# DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING UNIVERSITY OF BRITISH COLUMBIA CPEN 211 Introduction to Microcomputers, Fall 2016

Lab 5: Datapath of the "Simple RISC Machine"

- L1E: handin deadline is 11:59 PM Oct 12; demos are on Oct 13.
- L1C: handin deadline is 11:59 PM Oct 13; demos are on Oct 14.
- L1A, L1B, L1D: handin deadline is 5:00 PM Oct 17; demos are on Oct 18.
- All sections: Lab 6 is the week of Oct 24-28.

## 1 Introduction

Starting with this lab and continuing in the next two, you will build a computer that implements a "Reduced Instruction Set Computer" (RISC). The chips that power iPhone and Android smartphones include a computer using a RISC "instruction set architecture" (ISA) known as ARM. We will learn about ARM in the second half of CPEN 211, including how to program it at the level of software that is "closest" to the hardware. To this end, Labs 5, 6 and 7 are an important bridge between the first and second half of the course. By completing these labs, which are both fun and perhaps a little challenging, you will find it much easier to learn assembly coding during the second half of the course. You will also have taken advantage of a great opportunity to develop practical engineering skills, notably quickly and efficiently finding design errors in complex systems (a *very* highly valued skill in industry). These labs show you how a computer works "under the hood", which is knowledge that will be valuable to you regardless of which direction you pursue in ECE. This lab will be far more pleasant, and will take less time, if you can find and fix any bugs in your Verilog *quickly*. Hence, you should both review the debugging video https://youtu.be/2c3CZouKJKs and read this complete handout *before* you get started.

### 1.1 From C to Assembly

The "Simple RISC Machine" executes programs written using a *very* small set of "instructions". Instructions are the smallest unit of computation that a programmer can specify. Section 8 at the end of this handout provides a complete list of the instructions your computer will execute at the end of Lab 7, assuming you complete all parts of Labs 5 through 7. This set of instructions is "Turing-complete", meaning your Simple RISC Machine computer could in principle run any program given enough time and enough memory.

Consider the following line of C code:

$$f = (g + h) - (i + j);$$

To execute this C code on the Simple RISC Machine, we will see that a compiler (or a CPEN 211 student) must first divide the computation into three separate "instructions". Each of the three instructions specifies a small portion of the above computation. These steps are analogous to steps in a cookbook recipe. For example, to bake bread you might follow the steps: 1. Mix flour, water and yeast; 2. Let dough rise; 3. Bake in oven. Each step in the recipe is analogous to an instruction. We can implement the C code above using the following three instructions:

ADD t1, g, h; ADD t2, i, j; SUB f, t1, t2;

The first instruction, "ADD t1, g, h", adds the values of the variables "g" and "h" and puts the sum into the *temporary variable* "t1". You probably noticed that "t1" was not a part of our original C program. We used this temporary variable to help us split up our long line of C code into smaller steps. Similarly, the second instruction adds the variables "i" and "j" and puts the sum into the temporary variable "t2". Finally, the third instruction subtracts t2 from t1 and puts the result in f.

To implement a computer that can execute these instructions we will need a block of hardware that can add and subtract numbers. We also need some hardware that can store the numbers before and after the addition and subtraction operations. The following section shows one possible way to implement a processor that can execute not only the instructions above, but a few others that we will introduce later.

### 1.2 Overview of the Simple RISC Machine Datapath

To execute these instructions, in Lab 5 you will implement a computer *datapath* like that shown in Figure 1. This section introduces and briefly explains the different elements in Figure 1. While reading this section, focus on understanding what the individual blocks in this figure do and what other blocks they are connected to, rather than trying to understand *why* they are connected the way they are. To understand how all the blocks work together, we will subsequently "walk through" an example in Section 2.

The datapath consists of one register file ① containing 8 registers, each holding 16 bits; three multiplexers ⑥, ⑦ and ⑨; three 16-bit registers with load enable ③, ④ and ⑤; a 1-bit register with load enable ⑩; a shifter unit ⑧; and an arithmetic logic unit (ALU) ②. Below we describe each of the datapath components in detail and then return to our assembly example from Section 1.1.

#### 1.2.1 Register File

To compute with the variables g, h, i, j, t1, and t2, our computer needs a way to "remember" their values. For this purpose, the Simple RISC Machine employs the hardware structure known as a *register file* ①, shown in Figure 1.

A register file is a small memory. A *memory* is a hardware block that remembers information. It does this by storing the information as numbers consisting of a fixed number of bits in a fixed number of *locations*. Each location has associated with it an *address*, which is very much like the street address of a house. To record a number, say 42, for later use, we need to present that number to the memory along with the address of the location where it should be placed, which might be 3. Think of dropping off a friend, whose student number is "42", at their home, whose street number is "3". Here, "42" is the information we want to remember and 3 is the address of the location we will use to remember it. To use the information we saved in memory later, we need only present the memory with the location number, e.g., 3, and the memory will return the value 42. We could have stored any value, provided it fits in a memory location. Typically, a memory has a large number of locations. For example, if your computer has 4GB of RAM, it has roughly 4 billion locations (the "G" means giga, which stands for one billion), each of which stores an 8-bit value, also known as a *byte* of information (byte is abbreviated to "B").

The register file for the Simple RISC Machine is a memory that has eight locations numbered 0 through 7, each of which can hold a 16-bit number. Notice that specifying one of the eight locations requires only $\log_2 8 = 3$ bits, even though each location holds 16 bits. The eight different locations are more commonly referred to as R0, R1, ..., R7, where the R stands for "register". Thus, the entire register file can store $8 \times 16 = 128$ bits of information. An *individual* 16-bit register inside the register file is built using what is sometimes referred to as a *register with load enable*. A register with load enable can be implemented with the circuit shown in Figure 2, which you should know how to build based upon the material you will have seen in lecture by the time we reach the end of Slide Set 5 in lecture on Tuesday Oct 3.

Referring to Figure 2, when the input load is 0, the 16-bit value out is passed through the top input of the 2-input binary-select multiplexer back into the 16-bit D input. The effect of this is that when load is 0 and there is a rising edge of the clock, the value stored in the register does *not* change. On the other hand, when load is 1 the value of out is updated to the value on in on the rising edge of the clock. In Figure 1 a 16-bit register with load enable is represented using the symbol shown in Figure 3.
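Figure 2 translates almost directly into Verilog. The minimal sketch below keeps the flip-flops in their own module and builds the register with load enable from those flip-flops plus a 2-input multiplexer. The module names `vDFF` and `vDFFE` follow the naming style of the Dally textbook, but they, the port order, and the default width of 16 are assumptions. Match them to your own style guidelines.

```verilog
// n-bit D flip-flop: Q takes the value of D on every rising clock edge.
module vDFF(clk, D, Q);
  parameter n = 1;
  input clk;
  input [n-1:0] D;
  output [n-1:0] Q;
  reg [n-1:0] Q;

  always @(posedge clk)
    Q <= D;
endmodule

// Register with load enable (Figure 2). When load is 0 the mux feeds out
// back into the flip-flops, so the stored value is held; when load is 1
// the flip-flops capture in on the next rising edge.
module vDFFE(clk, load, in, out);
  parameter n = 16;
  input clk, load;
  input [n-1:0] in;
  output [n-1:0] out;
  wire [n-1:0] next_out;

  assign next_out = load ? in : out;   // 2-input binary-select mux
  vDFF #(n) STATE(clk, next_out, out); // n flip-flops holding the value
endmodule
```

In a register file built from this module, all eight registers would share the same `in` bus and differ only in their `load` signals.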

To implement the register file, you will use eight separate instances of the register with load enable circuit. The overall logic diagram for the complete register file is shown in Figure 4. You will notice this diagram includes two *decoder* blocks. A decoder is a combinational logic circuit which you will learn about when we cover the start of Slide Set 7, which will be either Tuesday Oct 4 or Thursday Oct 6. To keep it readable, Figure 4 does not show all locations (e.g., R4, R5 and R6 are omitted).

CPEN 211 - Lab 5 2 of 11

Figure 1: "Simple RISC Machine" Datapath

Figure 2: Register with Load Enable Circuit

To store the value of a variable into the register file, we need to pick one of the eight 16-bit registers with load enable. In APSC 160 this choice was made for you by your C compiler. In Labs 5, 6 and 7, you will need to make this choice yourself. For example, we could decide that for our example program we will put "g" in R0, "h" in R1, "i" in R2, and "j" in R3. Thus, our program now looks like:

ADD t1, R0, R1; ADD t2, R2, R3; SUB f, t1, t2;

Now, for this example, you still need to allocate registers in the register file for t1, t2 and f. Let's use R4 and R5 to hold t1 and t2. What about "f"? Well, you could use R6, but you are starting to run very low on "free" registers. If the program contains only this one line of C code, it does not need to hold onto "g" after the first instruction, so you can reuse R0 for "f". After making these register "allocation decisions", the program looks like the following:




Figure 3: Register with Load Enable Symbol (Note: this is used in both Figure 1 and 4)



Figure 4: Register File Internal Structure

```
ADD R4, R0, R1;
ADD R5, R2, R3;
SUB R0, R4, R5;
```

Now, let's consider how the value of variable "j" gets into and out of R3 inside the register file.

Suppose we want "j" to have the value 42 *before* we start executing our code. This is like dropping our friend off at their home in the analogy used earlier. To put 42 in R3, you would place the 16-bit value 0000000000101010 (binary for 42) on data\_in, set the 3-bit input writenum to 011 (binary for 3), set write to 1 to indicate we wish to save, or *write*, the value 42 into location 3 in the register file for later, and apply a rising edge on clk. Setting writenum to 011 drives the output of the upper 3:8 decoder to 00001000. Each output bit of the decoder is AND'ed with write. As write is 1, the load input to R3 is set to 1. On the rising edge of clk, 0000000000101010 will be copied to the 16-bit Q output of R3. At most one load enable input to R0 through R7 will be 1; if write is 0, all eight load-enable signals are 0. Section 2 describes in more detail the "MOV" instruction that is used to write a constant into a register using the above steps.

To recall, or *read*, the value of "j", which is now stored in R3, we set the 3-bit bus readnum to 011. The 8-bit output of the lower 3:8 decoder will be 00001000 and this will cause the 8-input one-hot select mux to copy the value of R3 to data\_out.
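The write path (upper decoder ANDed with write) and the read path (lower decoder driving a one-hot select mux) just described can be sketched in Verilog as follows. This is a minimal sketch under some assumptions. The module name `regfile` and its port order are not prescribed by this handout. The helper `regLE` is a hypothetical stand-in for your Figure 2 register with load enable. Because it uses an `if (load)` inside its always block, your style guidelines may instead require building it from a flip-flop module plus a multiplexer.

```verilog
// Hypothetical stand-in for the Figure 2 register with load enable.
module regLE(clk, load, in, out);
  parameter n = 16;
  input clk, load;
  input [n-1:0] in;
  output [n-1:0] out;
  reg [n-1:0] out;

  always @(posedge clk)
    if (load) out <= in;     // hold the old value when load is 0
endmodule

// 8 x 16-bit register file (Figure 4): one write port, one read port.
module regfile(data_in, writenum, write, readnum, clk, data_out);
  input [15:0] data_in;
  input [2:0] writenum, readnum;
  input write, clk;
  output [15:0] data_out;

  wire [7:0] write_onehot, load, read_onehot;
  wire [15:0] R0, R1, R2, R3, R4, R5, R6, R7;

  // Upper 3:8 decoder, e.g. writenum = 3'b011 gives 8'b00001000.
  assign write_onehot = 8'b00000001 << writenum;
  // AND every decoder output with write: at most one load bit is 1.
  assign load = write_onehot & {8{write}};

  // Eight registers, all fed from data_in; only their load bits differ.
  regLE #(16) REG0(clk, load[0], data_in, R0);
  regLE #(16) REG1(clk, load[1], data_in, R1);
  regLE #(16) REG2(clk, load[2], data_in, R2);
  regLE #(16) REG3(clk, load[3], data_in, R3);
  regLE #(16) REG4(clk, load[4], data_in, R4);
  regLE #(16) REG5(clk, load[5], data_in, R5);
  regLE #(16) REG6(clk, load[6], data_in, R6);
  regLE #(16) REG7(clk, load[7], data_in, R7);

  // Lower 3:8 decoder driving an 8-input one-hot select mux (AND-OR).
  // This read path is purely combinational: no clock is involved.
  assign read_onehot = 8'b00000001 << readnum;
  assign data_out = ({16{read_onehot[0]}} & R0) | ({16{read_onehot[1]}} & R1) |
                    ({16{read_onehot[2]}} & R2) | ({16{read_onehot[3]}} & R3) |
                    ({16{read_onehot[4]}} & R4) | ({16{read_onehot[5]}} & R5) |
                    ({16{read_onehot[6]}} & R6) | ({16{read_onehot[7]}} & R7);
endmodule
```

Writing the decoders as shifts of a single 1 bit is one compact option. A `case` statement per decoder would describe the same hardware.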

This register file is said to have two *ports*: one *write* port and one *read* port. Thus, up to one write and one read can be performed simultaneously. The read is combinational: whenever readnum changes, the value from the indicated register is driven out of the register file after some combinational delay. THE REGISTER READS ARE NOT COORDINATED TO THE CLOCK! The register write, however, is coordinated to the clock. At each rising clock edge, if write is 1, the value on the 16-bit register file input data\_in is written into the register indicated by the value on writenum. This write only happens on the rising clock edge. If, at the clock edge, write is 0, no register is updated. Be sure you follow the Verilog style guidelines for your register file code.

| Value on ALUop input | Operation |
|----------------------|-----------|
| 00                   | Ain + Bin |
| 01                   | Ain - Bin |
| 10                   | Ain & Bin |
| 11                   | Ain       |

Table 1: ALU operations

#### 1.2.2 Arithmetic Logic Unit

The ALU performs arithmetic or logical operations. This is the main piece of hardware that actually "computes" things inside a computer. With the exception of the shifter unit, you will notice that all of the other circuitry in Labs 5, 6 and 7 is used to remember values or to get values into and out of the ALU. The operation your ALU performs is selected by the value on its ALUop input, as shown in Table 1.

It is important that you use the encodings in Table 1; otherwise, in Labs 6 and 7 the assembler tool we give you will generate code that does not work with your Simple RISC Machine. Note that the ALU is purely combinational; there is no clock input. Whenever one of the inputs or the ALUop lines changes, the output changes accordingly (the Verilog "+" and "-" operations are combinational).
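A combinational ALU that follows the Table 1 encoding can be sketched with a single `case` statement. The module and port names are assumptions. The `Z` output is also an assumption: Section 1.2.7 only says that the status register records whether the ALU result was zero, and computing that flag inside the ALU is one convenient choice.

```verilog
// Combinational ALU sketch following the ALUop encoding in Table 1.
module ALU(Ain, Bin, ALUop, out, Z);
  input [15:0] Ain, Bin;
  input [1:0] ALUop;
  output [15:0] out;
  output Z;
  reg [15:0] out;

  always @(*) begin
    case (ALUop)
      2'b00: out = Ain + Bin;   // addition (16-bit result wraps around)
      2'b01: out = Ain - Bin;   // subtraction
      2'b10: out = Ain & Bin;   // bitwise AND
      2'b11: out = Ain;         // as listed in Table 1
      default: out = 16'bx;     // only reached if ALUop contains x or z
    endcase
  end

  // Zero flag for the status register: 1 exactly when out is all zeros.
  assign Z = (out == 16'b0);
endmodule
```

A useful corner case for the unit testbench is 16'hFFFF + 1, which wraps to 0 and should also set `Z`.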

#### 1.2.3 Pipeline Registers

The datapath contains three 16-bit registers with load enable ③, ④ and ⑤ that are not part of the register file. These hold the datapath signals A, B, and C. We will use these registers while executing an individual instruction. We need at least one of the two registers A and B because the ALU is purely combinational and the register file can read out only one of R0 through R7 at a time. You may want to try to eliminate the other two registers in Lab 7, but for now you should keep them.

#### 1.2.4 Source Operand Multiplexers

To enable more complex instructions besides addition and subtraction, it is helpful to be able to change the inputs to the ALU via the source operand multiplexers ⑥ and ⑦. For some of the Simple RISC Machine instructions you will add in Labs 6 and 7, you will want to set the 16-bit Ain input to the ALU to zero. For other instructions you will want to use a so-called "immediate operand", which will be described later.

#### 1.2.5 Shifter Unit

Some instructions are made more powerful by the ability to quickly multiply one of the inputs to the ALU by 2 or perform an integer divide by 2. The shifter unit ⑧ is a purely combinational logic block that accomplishes this as follows. The shifter takes one 16-bit input from the Q output of register B and outputs either the same value or the value shifted one bit to the left or right, according to the value on "shift" as described in Table 2.

For example, if the input to the shifter was 1111000011001111, then the output of the shifter would be as shown in Table 3.

#### 1.2.6 Writeback Multiplexer

Once the ALU has computed a value, the main 16-bit result is captured in register C. If we want to use this value as the input to a subsequent instruction, we need to write it into the register file. However, we will also want to input values into the register file from other sources. Thus, we add a writeback multiplexer ⑨.


| shift | Operation                                              |
|-------|--------------------------------------------------------|
| 00    | B                                                      |
| 01    | B shifted left 1-bit, least significant bit is zero    |
| 10    | B shifted right 1-bit, most significant bit, MSB, is 0 |
| 11    | B shifted right 1-bit, MSB is copy of B[15]            |

Table 2: Shift Operation Encoding

| shift | Output of shifter |
|-------|-------------------|
| 00    | 1111000011001111  |
| 01    | 1110000110011110  |
| 10    | 0111100001100111  |
| 11    | 1111100001100111  |

Table 3: Example shift operations starting with 1111000011001111
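Each row of Table 2 maps onto a Verilog concatenation, so the shifter can be sketched as one combinational `case` statement. The module and port names (`shifter`, `in`, `shift`, `sout`) are assumptions.

```verilog
// Combinational shifter sketch following the encoding in Table 2.
module shifter(in, shift, sout);
  input [15:0] in;
  input [1:0] shift;
  output [15:0] sout;
  reg [15:0] sout;

  always @(*) begin
    case (shift)
      2'b00: sout = in;                  // unchanged
      2'b01: sout = {in[14:0], 1'b0};    // left 1 bit, LSB becomes 0
      2'b10: sout = {1'b0, in[15:1]};    // right 1 bit, MSB becomes 0
      2'b11: sout = {in[15], in[15:1]};  // right 1 bit, MSB copies in[15]
      default: sout = 16'bx;             // only reached if shift contains x or z
    endcase
  end
endmodule
```

The four rows of Table 3 make a ready-made self-checking unit test for this block.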

#### 1.2.7 Status Register

In Lab 7 we will add support for instructions used to implement features of C such as "if" statements. These instructions will need to know some information about the values computed up to that point in the program. One important piece of information is whether the main 16-bit result of the ALU was exactly zero. If so, the status register ⑩ will be set to 1; otherwise, it will be set to 0.

## 2 Example Datapath Operation

Consider the addition of two registers, R2 and R3, with the result being stored in R5. The addition takes four clock cycles.

During the first cycle, readnum is set to 2 to indicate we want to read the 16-bit contents of R2 from the register file. At the same time, loada is set to 1 to indicate that register A should be updated on the next rising edge of the clock (the little triangle in the bottom left of register A indicates a clock input).

During the second cycle, we set loada back to 0 and set readnum to 3 to indicate we now want to read the 16-bit contents of R3. At the same time, loadb is set to 1. With these control input settings, on the next rising edge of the clock the contents of R3 will be copied to register B.

During the third cycle, loadb is set back to 0, ALUop is set to "00" to indicate addition (see Table 1 above), and asel is set to 0 to ensure the value in register A appears at the Ain input to the ALU. Similarly, bsel is set to 0 so that the output of the shifter unit appears at the Bin input to the ALU. The shifter unit is combinational logic that takes the 16-bit contents of B as its input and outputs the value either unmodified or shifted left or right by one bit position, depending upon the control input "shift". During this cycle the shift input is set to "00" to indicate the value in B should not be shifted. Also during this cycle, loadc is set to 1 to ensure the result of the addition is saved in register C on the next rising edge of the clock. We can optionally set loads to 1 if we want to record the "status" of the computation. In this lab the status will simply indicate whether the 16-bit result of the ALU was zero. In later labs we will see how this status information can be used to help implement "if" statements and "for" loops in a language like C or Java.

During the fourth cycle, loadc is set back to 0, vsel is set to 0, write is set to 1, and writenum is set to 5. Together these cause the value in register C to be fed back and written into register R5 within the register file.

Note that during the fourth cycle, the value fed back also appears on the output pins datapath\_out. For this lab you can connect datapath\_out to the 7-segment displays on the DE1-SoC using the logic provided in lab5\_top.v. This will be the primary way to tell if your datapath is working when it is on the DE1-SoC. However, this is not the fastest or easiest way to debug your circuit.

As alluded to earlier, the basic interface between hardware and software inside a computer is through "instructions". Each instruction tells the computer how to move and operate upon some data. Consider an instruction of the form "MOV R2, #32". This instruction would load the actual number 32 into register R2. This can be performed with the datapath as follows:

During the first cycle, assume the number 32 appears on the 8-bit signal datapath\_in (in a later lab, we will consider more realistic memory read/write strategies). During this same cycle, vsel is set to 1, write is set to 1, and the number 2 (indicating register #2) is driven on writenum. Note that this instruction can be performed in only one cycle, unlike the ADD instruction, which takes 4 cycles. This will become important in the next assignment.

If you have trouble following the above discussion, please also review the slides "Lab 5 Introduction" on Piazza which will be briefly discussed during lecture on Oct 6.
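The sketch below is a compact behavioral model of the Figure 1 datapath that you can drive cycle by cycle. It ties together the four-cycle ADD walkthrough and the one-cycle MOV. It deliberately ignores the style guidelines (it infers flip-flops directly instead of instantiating a flip-flop module). Treat it as a reference for working out expected values, not as your datapath.v.

Several details are assumptions:

- The module name `datapath_model` and its port order are hypothetical.
- asel = 1 is assumed to select the zero input mentioned in Section 1.2.4.
- The bsel immediate-operand path is not modeled.
- datapath\_in is assumed to arrive already 16 bits wide. If your input is narrower, such as the 8-bit signal mentioned above, zero-extend it first.

```verilog
// Behavioral reference model of the Figure 1 datapath (hypothetical module).
// NOT style-guideline compliant: flip-flops are inferred directly.
module datapath_model(clk, readnum, vsel, loada, loadb, shift, asel, ALUop,
                      loadc, loads, writenum, write, datapath_in,
                      status, datapath_out);
  input clk, vsel, loada, loadb, asel, loadc, loads, write;
  input [2:0] readnum, writenum;
  input [1:0] shift, ALUop;
  input [15:0] datapath_in;              // assumed 16 bits wide here
  output status;
  output [15:0] datapath_out;

  reg [15:0] R [0:7];                    // register file, block 1
  reg [15:0] A, B, C;                    // pipeline registers, blocks 3-5
  reg status;                            // status register, block 10
  reg [15:0] sout, alu_out;              // shifter and ALU outputs

  wire [15:0] data_out = R[readnum];            // combinational read port
  wire [15:0] data_in = vsel ? datapath_in : C; // writeback mux, block 9
  wire [15:0] Ain = asel ? 16'b0 : A;           // source mux, block 6
  wire [15:0] Bin = sout;                       // block 7: bsel path omitted

  always @(*) begin                      // shifter, block 8 (Table 2)
    case (shift)
      2'b00: sout = B;
      2'b01: sout = {B[14:0], 1'b0};
      2'b10: sout = {1'b0, B[15:1]};
      2'b11: sout = {B[15], B[15:1]};
      default: sout = 16'bx;
    endcase
  end

  always @(*) begin                      // ALU, block 2 (Table 1)
    case (ALUop)
      2'b00: alu_out = Ain + Bin;
      2'b01: alu_out = Ain - Bin;
      2'b10: alu_out = Ain & Bin;
      2'b11: alu_out = Ain;
      default: alu_out = 16'bx;
    endcase
  end

  // Every register updates only on a rising edge, and only if enabled.
  always @(posedge clk) begin
    if (write) R[writenum] <= data_in;
    if (loada) A <= data_out;
    if (loadb) B <= data_out;
    if (loadc) C <= alu_out;
    if (loads) status <= (alu_out == 16'b0);
  end

  assign datapath_out = C;               // the value fed back for writeback
endmodule
```

In simulation, ADD R5, R2, R3 from Section 2 corresponds to four rising edges, with loada, loadb, loadc (optionally with loads), and finally write asserted in turn. MOV R2, #32 needs only one edge, with vsel = 1 and write = 1.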

## 3 Lab Procedure

You will get through the lab far more quickly if you break down the overall work into smaller parts and complete, compile and test (using a testbench) each one *before* moving on to the next. If you are worried you will not have time to do the entire lab then see the marking guideline in Section 4 to see how you can earn part marks.

### 3.1 Revision Control and Regression Tests

Whether you work alone or with a partner, you will save time if you learn to use a revision control system such as "git". To easily collaborate with your partner remotely, use a password-protected online service such as https://bitbucket.org, which should be free for groups of two. Whatever you use to collaborate with your partner, it is your responsibility to make sure it is secure and private so that no other students in CPEN 211 or at other schools see your code. Revision control goes "hand in hand" with a great engineering practice known as "regression testing", which you can read about online (e.g., see the Wikipedia entry https://en.wikipedia.org/wiki/Regression\_testing). Regression testing is standard practice in industry hardware development. Briefly, the idea is that any time you change something, you re-run your test benches. If you follow Tip #11 in Section 7, you can make your tests "self-checking" so that you do not need to inspect the waves for every test condition. If set up to do so, each test bench can print a "PASS/FAIL" message after it runs, telling you whether any change you have made since you last ran all of your tests has broken something (you may even be able to run ModelSim from the command line). Every time you make a change to a given hardware unit (e.g., the register file or ALU), rerun all your tests and only commit your changes to the revision control system if all your tests "pass".

#### 3.1.1 Recommended Development Sequence

You may divide up the work between partners, but both students must understand the complete design well enough to code any part of it themselves, as you may be asked to write similar code on the programming proficiency test (Quiz 4).

- 1. Create a new ModelSim project called lab5.
- 2. Add a file regfile.v and write synthesizable code for your register file in this file. Note that this Verilog must conform to the style guidelines. Compile regfile.v in ModelSim to catch syntax errors.
- 3. Add a file regfile\_tb.v to your project for your register file testbench module. Section 7 provides some tips on how to write good unit-level testbenches and introduces some Verilog syntax for automatically checking whether the output results are correct, enabling what is known as regression testing.
- 4. Compile and simulate regfile\_tb.v along with regfile.v in ModelSim. Remember to use the waveform viewer. Even if everything looks OK, add some internal signals from inside the register file module you defined in regfile.v to your waveform viewer and rerun the simulation to verify the internal operation is as you expect.


- 5. Debugging. In the very likely case that a signal (wire or reg) appears wrong in the waveform viewer, first find the Verilog corresponding to the hardware block that "drives" that signal. If there is an obvious error in the code for that block that can explain the exact wrong result you are seeing, then try fixing it. If there is no obvious error, then you should not change the Verilog for that block! If you do make a change, remember to recompile your Verilog, restart the simulation and rerun it. If your change did not fix the specific bug you were trying to fix, then undo it! This is important! Changes that do not fix the bug tend to make it harder to find the bug you were originally interested in, because the bug tends to "move around". Instead of "undo", you can also comment out the "fix" code you added so you can get it back quickly in case you do end up needing it. Now, if/when you run out of things that could be wrong with the block defining the signal that looks wrong, do your best to guess which inputs to that block could lead to this incorrect output. Then add all the input signals to the block to the waveform viewer, restart the simulation and rerun it. If one of those inputs seems wrong, repeat Step 5 starting with the block that drives that signal. See Section 7 for more debugging tips.
- 6. Once you are satisfied that your register file works in simulation, compile regfile.v in Quartus and verify you see no warnings about inferred latches. After synthesis completes, view the logic diagram schematic that Quartus generates using "Tools" > "Netlist Viewers" > "RTL Viewer" to verify the hardware looks as you expect (e.g., combinational logic or flip-flops).
- 7. (Optional) Download the register file to your DE1-SoC and connect it to some top level signals. This step is time consuming. Do this step only if you have time or encounter bugs in Step 13.
- 8. Only after you have debugged the register file should you go through Steps 2-6 for the ALU.
- 9. Only after you have debugged the ALU should you go through Steps 2-6 for the shifter.
- 10. Now that all three main datapath modules are trusted to work, instantiate them in your datapath and add the remaining building blocks. Instantiate each of the three units (register file ①, ALU ② and shifter ⑧) inside datapath.v. Then, add the remaining logic blocks ③, ④, ⑤, ⑥, ⑦, ⑨ and ⑩ to your datapath module using synthesizable Verilog that conforms to the style guidelines. Use no fewer than one always block or assign statement per hardware block in Figure 1. Registers A, B, and C will each require an instantiated flip-flop module and an assign statement for the enable input in order to conform to the style guidelines.
- 11. Write a top level testbench for your datapath in datapath\_tb.v. It should implement at least the sequence shown below:

```
MOV R0, #7              ; take the absolute number 7 and store it in R0
MOV R1, #2              ; take the absolute number 2 and store it in R1
ADD R2, R1, R0, LSL#1   ; R2 = R1 + (R0 shifted left by 1) = 2 + 14 = 16
```

- 12. Test the overall datapath in ModelSim. If you see any suspicious outputs, follow the debugging procedure in Step 5 to find relevant internal signals to add to the waveform viewer, then restart and rerun the simulation.
- 13. Only after your overall design is working in ModelSim should you compile your top level and attempt to download it to your DE1-SoC. Use lab5\_top.v ONLY to help with this step. If you encounter bugs here, try Step 7. If you are still not sure what is going on and why the results differ from ModelSim, then modify lab5\_top.v to connect internal signals within your datapath to the LEDs on your DE1-SoC to help you follow the debugging rule "Quit Thinking and Look".


## 4 Marking Scheme

A reminder that *both* partners must be in attendance during the demo. Any partner who is absent will automatically receive a mark of zero even if they did their fair share of the work.

Your mark will be computed as the sum of the following. Partners may get a different mark based upon their ability to answer the TA's questions. If it becomes apparent that one partner did more than two thirds of the work, the partner who did less will receive a mark of zero. You **must** include at least one line of comments per always block, assign statement or module instantiation, and in test benches you must include one line of comments per test case saying what the test is for and what the expected outcome is.

- **3 Marks** One mark for each of regfile.v, alu.v and shifter.v. Up to half of the marks may be deducted here for violations of the style guidelines or for lack of comments.
- **3 Marks** One mark for each of the unit-level testbenches for your ALU, shifter and register file. To receive full marks you must test both basic functionality and some "corner cases". A corner case is an input you are not expecting to see or not expecting to see often. You may lose up to 2 marks if your test cases are not commented (one line per test case saying what is being tested and the expected outcome).
- **1 Mark** For your datapath.v. Your Verilog must be complete and conform to the style guidelines to receive this mark. Again, marks will be deducted for violations of the style guidelines or for lack of comments.
- **1 Mark** For your datapath\_tb.v. You should at least test the sequence described in Step 11 of Section 3.1.1.
- **2 Marks** Demonstrate your datapath works on your DE1-SoC using a test case of your own devising. Your TA will need to be convinced your design really works to get full marks. Note that the .sof file used for this may be regenerated before your demo either by the TA or an automated script using your Quartus Project File. Hence, you will automatically get zero marks for this part if your .zip file does not contain a Quartus Project File.

## 5 Lab Submission

Remember that your lab code, including all files created by ModelSim and Quartus, must be submitted using handin by the deadline shown on Page 1 of this handout (the day before your lab section). Use the same procedure outlined at the end of the Lab 3 handout, except that now you are submitting Lab 5, so use:

handin cpen211 Lab5-<section>

where <section> should be replaced by your lab section. Remember you can overwrite previous and trial submissions using -o.

A short (15 minute) video walking through how to submit remotely, e.g., from a laptop or home computer running Windows, can be found here: https://youtu.be/bxr5dq0xHzc (this was recorded for Lab 3, so make sure you make the appropriate changes for Lab 5).

To avoid losing marks, be sure your lab5.zip file includes *all* files, including files created by ModelSim and Quartus. This should include Verilog files you created (.v and possibly .sv) for both synthesizable and testbench code, waveform format files (.do), your Quartus Project File (.qpf), and ModelSim Project File (.mpf). We may run a script to automatically regenerate .sof files based upon the Quartus Project File, so include only a single Quartus Project File in your .zip file. You should be able to unzip your .zip file on a different computer and find all files needed by your Quartus and ModelSim projects (the ModelSim project file may not work due to the way it stores filenames, but you must include it in your .zip file).


## 6 Lab Demonstration Procedure

To reduce congestion in the lab we will be dividing each lab section into two one hour sessions. For example, for L1A the first session will run from 9 am to 10 am and the second session will run from 10 am to 11 am. We request that you show up no more than 10 minutes before the start of your assigned one hour "Lab 5 Marking Time", which will be posted on Connect at least 24 hours before your lab section along with your "Lab 5 TA". The TAs will have a randomly ordered list of lab partners and will start working their way down the list marking (look for "Lab 5 Marking Order" on Connect to get a rough idea, but be sure you are in the lab at the beginning in case the TA follows a different order). If you and your partner are not present when they ask to mark you and you have not told them where you are beforehand, your name will be put to the end of the list. If this happens the TA will be under no obligation to mark you, but may do so at their own discretion and if time permits.

Your TA should have your submitted code with them and have set up a "TA marking station" where you will go when it is your turn to be marked. However, we still require that you bring your DE1-SoC, submitted code, and (if you have one) laptop to MCLD 112. If they ask you to demo using your own workstation or laptop, then you **must** demo the exact same code you submitted via handin (so be sure you have a copy of that code with you). Note that the TAs will be instructed to avoid doing this for Labs 5, 6 and 7, as the system for getting them the code in time for the lab has been working perfectly since Lab 3 (the only issues have been some TAs not knowing how to copy the files to their own directories or laptops, and some TAs leaving it to the last minute to get prepared to mark).

## 7 More Debugging Tips

1. Rule 1 "Understand the System". Read the entire handout at least once.
2. Rule 2 "Make it Fail". Think of each test case or test vector as asking a question: the input is the question and the output is the answer. Which questions should you "ask"? Start by testing each basic feature, but good tests ask tough questions. You want your testbench to make your register file, ALU or shifter fail so that you can fix it before combining it with the other units. A fun warm-up exercise: http://www.nytimes.com/interactive/2015/07/03/upshot/a-quick-puzzle-to-test-your-problem-solving.html?_r=0
3. Rule 3 "Quit thinking and look". Add internal signals to your waveform viewer to verify your theory about what is wrong BEFORE changing any Verilog.
4. Rule 4 "Divide and conquer". For example, how can you verify that you correctly wrote a value into the register file? How would you know it was written correctly without reading the value out again? Again, the only way is to look at the internal values of the 16-bit registers. If you only try a test that writes and then reads and it doesn't work, you won't know which part (reading or writing) is broken.
5. Rule 5 "Change One Thing at a Time". If your fix does not work, undo it!
6. Rule 6 "Keep an audit trail". Especially if you are tired, write down what you are trying and what you observe so you don't forget.
7. Rule 7 "Check the plug". After hours of debugging it is not uncommon to hear someone say "Why didn't I check that first?"
8. Rule 8 "Get a Fresh Perspective". If you are really stuck, call your authorized partner. If you are both stuck, go to TA or instructor office hours or make an appointment with the instructor. Please note that the TAs are not paid to meet with you outside of lab or regularly scheduled office hours. If they do so, it is purely voluntary: they are not full-time employees of the university but students a bit more senior than you.

9. Rule 9 "If you didn't fix it, it ain't fixed". If you think you fixed a bug, undo your change and verify that the bug comes back. Often a change only appears to make the bug go away.
10. In your top-level testbench datapath\_tb.v, you may want to use Verilog hierarchical path names to simplify checking whether signals internal to the datapath behave as you expect. A hierarchical path name is the name of the top-level module followed by the module instance names, separated by periods. For example, consider the binary-select multiplexer in Figure 8.14 of Dally, which has a 3-bit internal signal "s" connecting the output of a decoder to a one-hot select mux. If an instance of Muxb3 labelled MUX1 is instantiated inside your datapath, and the datapath instance is called DUT inside your testbench module datapath\_tb, you could print the value of the signal "s" inside the mux with the following line in the script part of datapath\_tb.v:

```
$display("%b", datapath_tb.DUT.MUX1.s);
```

11. Make your testbenches self-checking. There are two approaches. One is simply to use an error signal that is set by if conditions (see the test\_q84 solution from Problem Set #2). Alternatively, if you save your testbench file with the extension .sv and set the file properties to SystemVerilog, you can make your testbench self-checking with the SystemVerilog assert statement. Continuing the above example, if we expect "s" to be 3'b100 at some point during the test script, we could write:

```
assert (datapath_tb.DUT.MUX1.s == 3'b100) $display("PASS");
  else $error("FAIL");
```

If you want simulation to stop on an error go to "Simulate > Runtime Options..." then select the "Message Severity" tab and change the setting for "Break Severity" to "Error".

12. The following Verilog "force" and "release" syntax can be helpful for debugging if you find a bug after you put your datapath together. From a ModelSim test script, using the hierarchical path name syntax above, you can override the logic value generated by the circuit itself and "inject" your own test values with the Verilog keyword "force". Continuing the example above, suppose "s" has the value 3'b010 but you would like to see what the output "b" of instance MUX1 would be if "s" were 3'b100 instead. You could write the following line in your Verilog testbench script to find out:

```
force datapath_tb.DUT.MUX1.s = 3'b100;
```

Later in your test script you can go back to using the value generated by the circuit by using "release":

```
release datapath_tb.DUT.MUX1.s;
```
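The first approach in tip 11 (an error signal set by if conditions) needs no SystemVerilog at all. The sketch below combines it with tip 10 by also checking an internal signal through a hierarchical path name, in the spirit of tip 4. The module `dec3to8`, its internal signal `onehot`, and the task `check` are hypothetical stand-ins chosen for illustration; substitute your own module, instance names and expected values.

```
// Hypothetical DUT: a 3:8 one-hot decoder standing in for your own module.
module dec3to8(input [2:0] sel, output [7:0] out);
  // Internal signal that the testbench inspects through a hierarchical path.
  wire [7:0] onehot = 8'b0000_0001 << sel;
  assign out = onehot;
endmodule

// Self-checking testbench: any mismatch prints a message and sets err.
module dec3to8_tb;
  reg  [2:0] sel;
  wire [7:0] out;
  reg        err;   // 1 if any check has failed
  integer    i;

  dec3to8 DUT(.sel(sel), .out(out));

  // Compare an actual value with the expected one and record any mismatch.
  // !== also flags x and z values, which == would not.
  task check(input [7:0] actual, input [7:0] expected);
    begin
      if (actual !== expected) begin
        $display("ERROR at time %0t: sel=%b got %b, expected %b",
                 $time, sel, actual, expected);
        err = 1'b1;
      end
    end
  endtask

  initial begin
    err = 1'b0;
    for (i = 0; i < 8; i = i + 1) begin
      sel = i;
      #5;                                        // let the outputs settle
      check(out, 8'b1 << i);                     // primary output
      check(dec3to8_tb.DUT.onehot, 8'b1 << i);   // internal signal (tip 10)
    end
    if (err) $display("FAILED");
    else     $display("PASSED");
  end
endmodule
```

Because each failure prints the time and the input, you can jump straight to that point in the waveform viewer, and the single PASSED/FAILED line makes it cheap to rerun the whole testbench after every change.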

# 8 The Simple RISC Machine Instruction Set Architecture

Tables 4 and 5 are relevant only to Labs 6 and 7, respectively; look at them now only if you are curious about those labs. Each row in these tables specifies a single instruction. The leftmost column gives the assembly syntax, the middle columns give the 16-bit binary encoding of the instruction (bits 15 through 0), and the rightmost column summarizes the operation the instruction performs. The three most significant bits of each instruction (bits 15 through 13) are the opcode, which indicates which instruction or class of instructions is represented.

**Terminology quick definitions.** These will be explained in more detail in the Lab 6 and Lab 7 handouts.

- Rn, Rd, Rm are 3-bit register number specifiers.
- im8 is an 8-bit immediate operand encoded as part of the instruction.
- im5 is a 5-bit immediate operand encoded as part of the instruction.

| Assembly Syntax (see text) | 15 14 13 | 12 11 | 10 9 8 | 7 6 5 4 3 2 1 0 | Operation (see text) |
|---|---|---|---|---|---|
| **Move Instructions** | opcode | op | 3b | 8b | |
| `MOV Rn,#<im8>` | 1 1 0 | 1 0 | Rn | im8 | R[Rn] = sx(im8) |
| `MOV Rd,Rm{,<sh_op>}` | 1 1 0 | 0 0 | 0 0 0 | Rd, sh, Rm | R[Rd] = sh_Rm |
| **ALU Instructions** | opcode | ALUop | 3b | 3b, 2b, 3b | |
| `ADD Rd,Rn,Rm{,<sh_op>}` | 1 0 1 | 0 0 | Rn | Rd, sh, Rm | R[Rd] = R[Rn] + sh_Rm |
| `CMP Rn,Rm{,<sh_op>}` | 1 0 1 | 0 1 | Rn | 0 0 0, sh, Rm | status = f(R[Rn] - sh_Rm) |
| `AND Rd,Rn,Rm{,<sh_op>}` | 1 0 1 | 1 0 | Rn | Rd, sh, Rm | R[Rd] = R[Rn] & sh_Rm |
| `MVN Rd,Rm{,<sh_op>}` | 1 0 1 | 1 1 | 0 0 0 | Rd, sh, Rm | R[Rd] = ~sh_Rm |
| **Memory Instructions** | opcode | op | 3b | 3b, 5b | |
| `LDR Rd,[Rn{,#<im5>}]` | 0 1 1 | 0 0 | Rn | Rd, im5 | R[Rd] = M[R[Rn]+sx(im5)] |
| `STR Rd,[Rn{,#<im5>}]` | 1 0 0 | 0 0 | Rn | Rd, im5 | M[R[Rn]+sx(im5)] = R[Rd] |

Table 4: Instructions you will add in Lab 6
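To connect Table 4 to Verilog, the sketch below pulls each field out of a 16-bit instruction and computes sx(im8) and sx(im5) by replicating the sign bit (bit 7 or bit 4) into the upper bits. The module and port names (`instr_fields`, `sximm8`, `sximm5`, and so on) are illustrative choices, not names required by Lab 6. Note that the fields overlap (Rd, sh and Rm share bits with im8 and im5), so for a given instruction only the fields its row in Table 4 calls for are meaningful.

```
// Hypothetical helper: split a 16-bit Simple RISC Machine instruction into
// the fields of Table 4. All outputs are always driven; the opcode decides
// which of them a later stage should actually use.
module instr_fields(
  input  [15:0] instr,
  output [2:0]  opcode,   // bits 15:13
  output [1:0]  op,       // bits 12:11 (op or ALUop)
  output [2:0]  Rn,       // bits 10:8
  output [2:0]  Rd,       // bits 7:5
  output [1:0]  sh,       // bits 4:3
  output [2:0]  Rm,       // bits 2:0
  output [15:0] sximm8,   // sx(im8): bits 7:0 sign extended to 16 bits
  output [15:0] sximm5    // sx(im5): bits 4:0 sign extended to 16 bits
);
  assign opcode = instr[15:13];
  assign op     = instr[12:11];
  assign Rn     = instr[10:8];
  assign Rd     = instr[7:5];
  assign sh     = instr[4:3];
  assign Rm     = instr[2:0];
  // Sign extension: copy the immediate's most significant bit upward.
  assign sximm8 = {{8{instr[7]}},  instr[7:0]};
  assign sximm5 = {{11{instr[4]}}, instr[4:0]};
endmodule
```

For example, `MOV R3,#-2` encodes as 16'b110_10_011_11111110 (16'hD3FE), which gives Rn = 3 and sximm8 = 16'hFFFE, while `ADD R2,R1,R0` with sh = 00 encodes as 16'b101_00_001_010_00_000 (16'hA140).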

| Assembly Syntax (see text) | 15 14 13 | 12 11 | 10 9 8 | 7 6 5 4 3 2 1 0 | Operation (see text) |
|---|---|---|---|---|---|
| **Branch** | opcode | | cond | 8b | |
| `B <label>` | 0 0 1 | 0 0 | 0 0 0 | im8 | PC+=sx(im8) |
| `BEQ <label>` | 0 0 1 | 0 0 | 0 0 1 | im8 | if Z=1 then PC+=sx(im8) |
| `BNE <label>` | 0 0 1 | 0 0 | 0 1 0 | im8 | if Z=0 then PC+=sx(im8) |
| `BLT <label>` | 0 0 1 | 0 0 | 0 1 1 | im8 | if N≠V then PC+=sx(im8) |
| `BLE <label>` | 0 0 1 | 0 0 | 1 0 0 | im8 | if N≠V or Z=1 then PC+=sx(im8) |
| **Call & Return** | opcode | op | Rn | 8b | |
| `BL <label>` | 0 1 0 | 1 1 | 1 1 1 | im8 | R7=PC; PC+=sx(im8) |
| `BLX Rd` | 0 1 0 | 1 0 | 1 1 1 | Rd, 0 0 0 0 0 | R7=PC; PC=Rd |
| `BX Rd` | 0 1 0 | 0 0 | 0 0 0 | Rd, 0 0 0 0 0 | PC=Rd |

Table 5: Instructions you will add in Lab 7
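The cond field (bits 10 through 8) of the branch instructions in Table 5 selects a condition on the status flags N, V and Z. A minimal combinational sketch of that selection follows; the module name `branch_taken` is hypothetical, and because Table 5 does not define cond values 101 through 111, treating them as "not taken" in the `default` case is an assumption of this sketch rather than part of the specification.

```
// Hypothetical sketch: is a Table 5 branch taken, given its cond field
// (bits 10:8 of the instruction) and the N, V, Z status flags?
module branch_taken(
  input      [2:0] cond,
  input            N, V, Z,
  output reg       taken
);
  always @(*) begin
    case (cond)
      3'b000:  taken = 1'b1;              // B:   always
      3'b001:  taken = Z;                 // BEQ: Z = 1
      3'b010:  taken = ~Z;                // BNE: Z = 0
      3'b011:  taken = (N != V);          // BLT: N != V
      3'b100:  taken = (N != V) | Z;      // BLE: N != V or Z = 1
      default: taken = 1'b0;              // not defined in Table 5 (assumption)
    endcase
  end
endmodule
```

Computing "taken" separately from the PC update (PC+=sx(im8)) keeps the condition logic small enough to test exhaustively on its own before it is connected to the rest of the machine.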

- `<sh_op>` and sh are 2-bit immediate operands encoded as part of the instruction.
- sx(f) sign extends the immediate value f to 16 bits.
- sh_Rm is the value of Rm after passing through the shifter connected to the Bin input of the ALU.
- N, V, and Z are the negative, overflow and zero flags of the status register (only Z is implemented in Lab 5).
- status refers to all three of N, V and Z.
- R[x] refers to the 16-bit value stored in register x.
- M[x] is the 16-bit value stored in main memory (added in Lab 6) at address x.
- PC refers to the program counter register (added in Lab 6).
- `<label>` refers to a textual marker in the assembly that indicates an instruction address.